Abstract: The sparse representation problem of recovering an
$N$-dimensional sparse vector $x$ from $M < N$ linear observations
$y = Dx$ given dictionary $D$ is considered. The standard approach
is to let the elements of the dictionary be independent and
identically distributed (IID) zero-mean Gaussian and minimize
the $\ell_1$-norm of $x$ under the constraint $y = Dx$. In this paper,
the performance of $\ell_1$-reconstruction is analyzed, when the
dictionary is bi-orthogonal $D = [O_1 \; O_2]$, where $O_1, O_2$ are
independent and drawn uniformly according to the Haar measure on the
group of orthogonal $M \times M$ matrices. By an application of the
replica method, we obtain the critical conditions under which
perfect $\ell_1$-recovery is possible with bi-orthogonal dictionaries.

I. Introduction 

The sparse representation (SR) problem has wide applicability,
for example, in communications [1], [2], multimedia [3], and
compressive sampling (CS) [4], [5]. The standard SR problem is to
find the sparsest $x \in \mathbb{R}^N$ that is the solution to
the set of $M < N$ linear equations

$$y = Dx, \tag{1}$$

for a given dictionary or sensing matrix $D \in \mathbb{R}^{M \times N}$
and observation $y$. Finding such $x$ is, however, NP-hard. Thus, a
variety of practical algorithms have been developed that solve the
SR problem sub-optimally. The topic of the current paper is the
convex relaxation approach where, instead of searching for the $x$
having the minimum $\ell_0$-norm, the goal is to find the minimum
$\ell_1$-norm solution of (1).

Let $K$ be the number of non-zero elements in $x$ and assume
that the convex relaxation method is used for recovery. The
trade-off between two parameters $\rho = K/N$ and $\alpha = M/N$ is
then of special interest since it tells how much the sparse signal
can be compressed under $\ell_1$-reconstruction. An interesting
question then arises: How does the sparsity-undersampling
($\rho$ vs. $\alpha$) trade-off depend on the choice of dictionary $D$?

The empirical study in [6, Sec. 15 in SI] gave evidence
that the worst case $\rho$ vs. $\alpha$ trade-off is quite universal
w.r.t. different random matrix ensembles. Analysis in [7] further
revealed that the typical conditions for perfect $\ell_1$-recovery
are the same for all sensing matrices that are sampled from
the rotationally invariant matrix ensembles. Dictionaries with
independent identically distributed (IID) zero-mean Gaussian
elements are one example of this. But correlations in $D$ can
degrade the performance of $\ell_1$-recovery [8], so it is not fully
clear how the choice of $D$ affects the $\rho$ vs. $\alpha$ trade-off.



Besides the random / unstructured dictionaries mentioned
above, the information theoretic approach in [9] encompasses
more general matrix ensembles but does not consider the
$\ell_1$-reconstruction limit. Several studies in the literature have
also considered the specific construction where $D$ is formed by
concatenating two orthogonal matrices [10]-[14]. Such
bi-orthogonal dictionaries are easy to implement and can give
elegant theoretical insights. Unfortunately, the "mutual
coherence" based methods used in these papers provide pessimistic,
or worst case, thresholds. Furthermore, the results are not easy
to compare between the unstructured and bi-orthogonal cases.

We consider the analysis of the bi-orthogonal SR setup

$$y = Dx = [O_1 \; O_2] \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = O_1 x_1 + O_2 x_2, \tag{2}$$

where the dictionary is constructed by concatenating two
independent matrices $O_1$ and $O_2$ that are drawn uniformly
according to the Haar measure on the group of all orthogonal
$M \times M$ matrices. We use the non-rigorous replica method
(see, e.g., [7], [15]-[17] for related works) to assess $\rho$ for a
given $\alpha$, up to which the $\ell_1$-recovery is successful. This
allows a direct comparison between the random and bi-orthogonal
dictionaries in average or typical sense. The main result of
the paper is the sparsity-undersampling trade-off for the
bi-orthogonal SR setup (2). We find that this matches the
unstructured IID Gaussian dictionary when the non-zero components
are uniformly distributed between the two blocks. Surprisingly,
when the non-zero components are concentrated more on
one block than the other, the bi-orthogonal dictionaries can
cope with higher overall densities than the unstructured case.
This extends to the case of general $T$-concatenated orthogonal
dictionaries as reported elsewhere [18].

II. Problem Setting 
Consider the SR problem of finding the sparsest vector
$x = [x_1^{\mathsf T} \; x_2^{\mathsf T}]^{\mathsf T} \in \mathbb{R}^{N}$,
given the dense vector $y \in \mathbb{R}^{M}$ and the
dictionary $D = [O_1 \; O_2] \in \mathbb{R}^{M \times N}$. By definition
$M/N = 1/2$ and $O_i^{\mathsf T} O_i = I_M$ for this setup. Let $K_1$ and
$K_2$ be the number of non-zero elements in $x_1$ and $x_2$,
respectively, so that $K = K_1 + K_2$ is the total number of non-zero
elements in $x$. Denote $\rho = K/(2M)$ for the overall sparsity of the
source while $\rho_1 = K_1/M$ and $\rho_2 = K_2/M$ represent the
signal densities of the two blocks.



It is important to note that $D$ in (2) does not belong
to the rotationally invariant matrix ensembles [7], and there
are complex dependencies between the elements due to the
orthogonality constraints. The fact that $O_1^{\mathsf T} O_2 \neq 0$ makes
the analysis of the setup highly non-trivial (for a sketch, see
Appendices A and B). Thus, only the bi-orthogonal case is
considered here and the analysis of general $T$-concatenated
orthogonal dictionaries is reported elsewhere [18].
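As an illustration of the dictionary construction described above, the following minimal sketch draws two independent orthogonal matrices by applying Gram-Schmidt to IID Gaussian vectors, which is one standard way of sampling from the Haar measure. The function names `haar_orthogonal` and `gram`, the size `M = 8`, and the seed are assumptions made for this example and do not come from the paper.

```python
import math
import random


def haar_orthogonal(m, rng):
    """Sample an m x m orthogonal matrix, returned as a list of columns.

    Gram-Schmidt applied to IID Gaussian vectors yields a Haar-distributed
    orthogonal matrix (illustrative helper, not from the paper).
    """
    cols = []
    for _ in range(m):
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        # Two passes of modified Gram-Schmidt for numerical stability.
        for _ in range(2):
            for q in cols:
                dot = sum(a * b for a, b in zip(q, v))
                v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        cols.append([a / norm for a in v])
    return cols


def gram(cols_a, cols_b):
    """Return A^T B for matrices given as lists of columns."""
    return [[sum(x * y for x, y in zip(a, b)) for b in cols_b] for a in cols_a]


rng = random.Random(0)
M = 8
O1 = haar_orthogonal(M, rng)
O2 = haar_orthogonal(M, rng)
D_cols = O1 + O2  # columns of D = [O1 O2], an M x 2M dictionary

G11 = gram(O1, O1)  # close to the identity
G12 = gram(O1, O2)  # itself orthogonal, hence not the zero matrix
```

Since $O_1^{\mathsf T} O_2$ is again an orthogonal matrix, each of its rows has unit norm, so the cross-Gram matrix `G12` can never vanish. This is the dependence between the two blocks that the analysis has to handle.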

The system is assumed to approach the large system limit
$M, K_1, K_2 \to \infty$ where the signal densities $\rho_1, \rho_2$ are
finite and fixed. We let $\{x_i\}_{i=1}^{2}$ be independent sparse
random vectors whose components are IID according to

$$p_i(x) = (1 - \rho_i)\,\delta(x) + \rho_i\, e^{-x^2/2}/\sqrt{2\pi}, \quad i = 1, 2. \tag{3}$$
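The Bernoulli-Gaussian source model in (3) can be sampled directly. The sketch below is only an illustration: the helper name `sample_block`, the block length, the seed and the densities 0.1 and 0.3 are arbitrary choices for the example.

```python
import random


def sample_block(m, rho, rng):
    """Draw m IID components from (1 - rho) * delta(x) + rho * N(0, 1)."""
    return [rng.gauss(0.0, 1.0) if rng.random() < rho else 0.0 for _ in range(m)]


rng = random.Random(1)
M = 20000
rho1, rho2 = 0.1, 0.3  # example block densities (assumed values)
x1 = sample_block(M, rho1, rng)
x2 = sample_block(M, rho2, rng)

K1 = sum(1 for v in x1 if v != 0.0)  # non-zeros in block 1
K2 = sum(1 for v in x2 if v != 0.0)  # non-zeros in block 2
rho_overall = (K1 + K2) / (2 * M)    # empirical overall density K / (2M)
```

For large $M$ the empirical block densities `K1 / M` and `K2 / M` concentrate around $\rho_1$ and $\rho_2$, and the overall density around $(\rho_1 + \rho_2)/2$.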



The convex relaxation of the original problem is considered
and the goal is to find $x = [x_1^{\mathsf T} \; x_2^{\mathsf T}]^{\mathsf T}$
that is the solution to

$$\min_{x_1, x_2} \; \|x_1\|_1 + \|x_2\|_1 \quad \text{s.t.} \quad y = O_1 x_1 + O_2 x_2. \tag{4}$$



Note that we do not consider the weighted $\ell_1$-reconstruction
analyzed for the rotationally invariant $D$ in [13]. This
corresponds to the scenario where the user has no prior knowledge
about the relative statistics of the data blocks. In the next
section we find the typical density $\rho = (\rho_1 + \rho_2)/2$ for
which perfect $\ell_1$-reconstruction is possible under the constraint (2).

III. Analysis 
Let the postulated prior of the sparse vector $x_i$ be

$$q_\beta(x_i) = e^{-\beta \|x_i\|_1}, \quad i = 1, 2, \tag{5}$$

where the components of $x_i \in \mathbb{R}^M$ are IID. The inverse
temperature $\beta$ is a non-negative parameter. Let
$q_\beta(x) = q_\beta(x_1) q_\beta(x_2)$ be the postulated prior of $x$ in (2)
and define a mismatched posterior mean estimator

$$\langle x \rangle_\beta = Z_\beta(y, D)^{-1} \int x\, \delta(y - Dx)\, q_\beta(x)\, dx. \tag{6}$$

Here $Z_\beta(y, D) = \int \delta(y - Dx)\, q_\beta(x)\, dx$ acts as the
partition function of the system. Then, the zero-temperature estimate
$\langle x \rangle_{\beta \to \infty}$ is a solution (if at least one exists)
to the original $\ell_1$-minimization problem (4).

Utilizing one of the standard tools from statistical
physics, namely the non-rigorous replica method, we next study
the behavior of the estimator (6). We accomplish this by
examining the so-called free energy density $f$ of the system in
the thermodynamic limit $N \to \infty$. As a corollary, we obtain
the critical compression threshold for the original optimization
problem (4) when $\beta \to \infty$.

A. Free Energy 

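As sketched in Appendix A, the free energy density related
to (6) reads under the replica symmetric (RS) ansatz

$$f_{\mathrm{rs}} = -\frac{1}{2} \lim_{\beta \to \infty} \frac{1}{\beta} \lim_{u \to 0} \frac{\partial}{\partial u} \lim_{M \to \infty} \frac{1}{M} \log \mathsf{E}_{y,D}\{Z_\beta^u(y, D)\} = \frac{1}{2} \operatorname*{cextr}_{\Theta_1, \Theta_2} \sum_{i=1}^{2} T(\Theta_i), \tag{7}$$

where

$$T(\Theta_i) = \frac{\rho_i - 2 m_i + Q_i}{4 \chi_i} - \frac{Q_i \hat{Q}_i}{2} + \frac{\chi_i \hat{\chi}_i}{2} + m_i \hat{m}_i + \int \Big[ (1 - \rho_i)\, \phi\big(z \sqrt{\hat{\chi}_i}; \hat{Q}_i\big) + \rho_i\, \phi\big(z \sqrt{\hat{m}_i^2 + \hat{\chi}_i}; \hat{Q}_i\big) \Big] Dz, \tag{8}$$

$\Theta_i = \{Q_i, \chi_i, m_i, \hat{Q}_i, \hat{\chi}_i, \hat{m}_i\}$ is a set of
parameters that take values on the extended real line,
$Dz = (2\pi)^{-1/2} e^{-z^2/2} dz$ is the Gaussian measure and

$$\phi(h; \hat{Q}) = \min_{x \in \mathbb{R}} \big\{ \hat{Q} x^2/2 - h x + |x| \big\}. \tag{9}$$

In contrast to, e.g., [7], [15], here $\operatorname{cextr}_{\Theta}\, g(\Theta)$
denotes a constrained extremization of the function $g(\Theta)$ in which
$\chi_1 = \chi_2$ needs to be satisfied.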
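The scalar minimization that defines $\phi$ can be solved by hand, which may help when reading the later expressions. Assuming $\phi(h; \hat{Q}) = \min_x \{\hat{Q} x^2/2 - hx + |x|\}$ with $\hat{Q} > 0$, setting the derivative to zero on each side of $x = 0$ gives the soft-thresholding minimizer $x^\star = \operatorname{sign}(h) \max(|h| - 1, 0)/\hat{Q}$ and the value $-\max(|h| - 1, 0)^2/(2\hat{Q})$. The sketch below compares this closed form against a brute-force grid search; the function names and grid parameters are illustrative assumptions.

```python
import math


def phi_closed_form(h, q_hat):
    """Closed-form value of min_x {q_hat * x^2 / 2 - h * x + |x|}, q_hat > 0."""
    t = max(abs(h) - 1.0, 0.0)
    return -t * t / (2.0 * q_hat)


def phi_minimizer(h, q_hat):
    """Soft-thresholding minimizer of the same scalar problem."""
    t = max(abs(h) - 1.0, 0.0)
    return math.copysign(t, h) / q_hat


def phi_grid(h, q_hat, half_width=10.0, n=20001):
    """Brute-force minimum over a uniform grid on [-half_width, half_width]."""
    best = float("inf")
    for k in range(n):
        x = -half_width + 2.0 * half_width * k / (n - 1)
        best = min(best, q_hat * x * x / 2.0 - h * x + abs(x))
    return best


examples = [(h, q) for h in (-3.0, -0.5, 0.2, 1.7, 4.0) for q in (0.5, 2.0)]
max_gap = max(abs(phi_grid(h, q) - phi_closed_form(h, q)) for h, q in examples)
```

Inputs with $|h| \le 1$ are mapped to $x^\star = 0$, which is how the $\ell_1$ penalty produces exact zeros in the estimate.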

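Remark 1. If the dictionary is sampled from the rotationally
invariant matrix ensembles, the RS free energy density reads

$$f_{\mathrm{rs}} = \frac{1}{2} \operatorname*{extr}_{\Theta_1, \Theta_2} \Bigg\{ \sum_{i=1}^{2} \bigg( \frac{\rho_i - 2 m_i + Q_i}{2 \sum_{j=1}^{2} \chi_j} - \frac{Q_i \hat{Q}_i}{2} + \frac{\chi_i \hat{\chi}_i}{2} + m_i \hat{m}_i + \int \Big[ (1 - \rho_i)\, \phi\big(z \sqrt{\hat{\chi}_i}; \hat{Q}_i\big) + \rho_i\, \phi\big(z \sqrt{\hat{m}_i^2 + \hat{\chi}_i}; \hat{Q}_i\big) \Big] Dz \bigg) \Bigg\}, \tag{10}$$

where extr is an unconstrained extremization w.r.t. $\{\Theta_1, \Theta_2\}$.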

B. Constrained Extremization 

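Let us denote $Q(x) = \int_x^\infty Dz$ for the Q-function and define

$$r(x) = \sqrt{\frac{x}{2\pi}}\, e^{-\frac{1}{2x}} - (1 + x)\, Q\Big(\frac{1}{\sqrt{x}}\Big). \tag{11}$$

After solving the integrals and the optimization problem in
(9), the function $T$ becomes

$$T(\Theta_i) = \frac{\rho_i - 2 m_i + Q_i}{4 \chi_i} - \frac{Q_i \hat{Q}_i}{2} + \frac{\chi_i \hat{\chi}_i}{2} + m_i \hat{m}_i + \frac{1 - \rho_i}{\hat{Q}_i}\, r(\hat{\chi}_i) + \frac{\rho_i}{\hat{Q}_i}\, r(\hat{m}_i^2 + \hat{\chi}_i). \tag{12}$$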
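The step from the Gaussian integrals over $\phi$ to the function $r$ can be checked numerically. The sketch below assumes the reading $r(x) = \sqrt{x/(2\pi)}\, e^{-1/(2x)} - (1 + x)\, Q(1/\sqrt{x})$ and the closed form $\phi(h; \hat{Q}) = -\max(|h| - 1, 0)^2/(2\hat{Q})$, and verifies that $\int \phi(z \sqrt{\hat{\chi}}; \hat{Q})\, Dz = r(\hat{\chi})/\hat{Q}$ as well as the derivative identity $r'(x) = -Q(1/\sqrt{x})$. Function names, the quadrature rule and the test point are illustrative assumptions.

```python
import math


def Q(x):
    """Gaussian tail probability Q(x) = P(Z > x) for standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def r(x):
    """Assumed form of the auxiliary function r(x) for x > 0."""
    return (math.sqrt(x / (2.0 * math.pi)) * math.exp(-1.0 / (2.0 * x))
            - (1.0 + x) * Q(1.0 / math.sqrt(x)))


def gaussian_average_of_phi(chi_hat, q_hat, z_max=12.0, n=24000):
    """Midpoint-rule estimate of the Gaussian average of phi(z*sqrt(chi_hat); q_hat)."""
    step = 2.0 * z_max / n
    total = 0.0
    for k in range(n):
        z = -z_max + (k + 0.5) * step
        t = max(abs(z) * math.sqrt(chi_hat) - 1.0, 0.0)
        phi = -t * t / (2.0 * q_hat)
        total += phi * math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi) * step
    return total


chi_hat, q_hat = 1.3, 2.0
integral_gap = abs(gaussian_average_of_phi(chi_hat, q_hat) - r(chi_hat) / q_hat)

eps = 1e-5
finite_difference = (r(chi_hat + eps) - r(chi_hat - eps)) / (2.0 * eps)
derivative_gap = abs(finite_difference + Q(1.0 / math.sqrt(chi_hat)))
```

The derivative identity is what produces the Q-function terms when the free energy is differentiated with respect to $\hat{\chi}_i$.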



Introducing the Lagrange multiplier $\eta$ for the constraint
$\chi_1 = \chi_2$, an alternative formulation for the free energy density reads

$$f_{\mathrm{rs}} = \frac{1}{2} \operatorname{extr} \big\{ \eta (\chi_1 - \chi_2) + T(\Theta_1) + T(\Theta_2) \big\}, \tag{13}$$

where the extremization is now an unconstrained problem.
Taking partial derivatives w.r.t. all optimization variables and
setting the results to zero yields the identities

$$\hat{Q}_i = \hat{m}_i \quad \text{and} \quad \chi_i = \frac{1}{2 \hat{m}_i}, \quad i = 1, 2. \tag{14}$$



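We also find that the expressions

$$\frac{1}{\hat{m}_i} = \frac{2}{\hat{m}_i} \bigg[ 2 (1 - \rho_i)\, Q\Big(\frac{1}{\sqrt{\hat{\chi}_i}}\Big) + 2 \rho_i\, Q\Big(\frac{1}{\sqrt{\hat{m}_i^2 + \hat{\chi}_i}}\Big) \bigg], \tag{15}$$

$$\hat{\chi}_i = \frac{\rho_i - 2 m_i + Q_i}{2 \chi_i^2} + (-1)^i \eta \tag{16}$$

are satisfied by the extremum of (13). Under perfect
reconstruction in mean square error (MSE) sense (see, e.g., [7],
[15] for details), we have $\rho_i = Q_i = m_i$, $\hat{m}_i \to \infty$ and
$\chi_i \to 0$. Hence, (15) simplifies to the condition

$$\frac{1}{2} = 2 (1 - \rho_i)\, Q\Big(\frac{1}{\sqrt{\hat{\chi}_i}}\Big) + \rho_i. \tag{17}$$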



'Xi 

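On the other hand, omitting the terms of the order $O(1/\hat{m}_i^3)$,
we have from the partial derivatives w.r.t. $\hat{Q}_i$ and $\hat{m}_i$,

$$Q_i = \rho_i - \frac{4 \rho_i}{\hat{m}_i \sqrt{2\pi}} + \frac{\rho_i (1 + \hat{\chi}_i) - 2 (1 - \rho_i)\, r(\hat{\chi}_i)}{\hat{m}_i^2}, \tag{18}$$

$$m_i = \rho_i - \frac{2 \rho_i}{\hat{m}_i \sqrt{2\pi}}, \tag{19}$$

respectively, where we used (14) to simplify the expressions.
Plugging the above to (16) and using again (14) yields

$$\hat{\chi}_i = (-1)^i \eta + 2 \rho_i (1 + \hat{\chi}_i) - 4 (1 - \rho_i)\, r(\hat{\chi}_i). \tag{20}$$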



Before stating the final result, let us introduce a real
parameter $\mu \in [0, 1]$ and assume without loss of generality
that $\rho_1 = \mu \rho_2$. Then the per-block densities can be written as

$$\rho_1 = \frac{2\mu}{1 + \mu}\, \rho \quad \text{and} \quad \rho_2 = \frac{2}{1 + \mu}\, \rho, \tag{21}$$

where $\rho = \rho(\mu)$ is the overall density of the source. The
parameter $\mu$ thus determines how uniformly the non-zero
components are distributed between the two blocks: $\mu = 1$
means fully uniformly, $\mu = 0$ implies that all non-zero
components are in the second block.

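Main Result. Let $x \in \mathbb{R}^{2M}$, $D \in \mathbb{R}^{M \times 2M}$ and
$y = Dx$ as in (2). Given the parameter $\mu \in [0, 1]$, the typical
density $\rho(\mu)$ of the solution to the optimization problem

$$\arg\min_{x = [x_1^{\mathsf T} \; x_2^{\mathsf T}]^{\mathsf T} \in \mathbb{R}^{2M}} \|x_1\|_1 + \|x_2\|_1 \quad \text{s.t.} \quad y = Dx,$$

is determined in the large system limit by the solutions of the
following set of coupled equations

$$\eta = \frac{4 \mu \rho}{1 + \mu} \big[ 1 + \hat{\chi}_1 + 2 r(\hat{\chi}_1) \big] - 4 r(\hat{\chi}_1) - \hat{\chi}_1, \tag{22}$$

$$\hat{\chi}_2 = \frac{4 \rho}{1 + \mu} \big[ 1 + \hat{\chi}_2 + 2 r(\hat{\chi}_2) \big] - 4 r(\hat{\chi}_2) + \eta, \tag{23}$$

$$\hat{\chi}_1 = \bigg[ Q^{-1}\bigg( \frac{1 + \mu - 4 \mu \rho}{4 (1 + \mu - 2 \mu \rho)} \bigg) \bigg]^{-2}, \tag{24}$$

$$\rho = \frac{1 + \mu}{4} \cdot \frac{1 - 4 Q\big(1/\sqrt{\hat{\chi}_2}\big)}{1 - 2 Q\big(1/\sqrt{\hat{\chi}_2}\big)}, \tag{25}$$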


where $Q^{-1}$ is the functional inverse of the Q-function. For
uniform sparsity, that is, $\mu = 1$ and $\rho_1 = \rho_2$, we have
$\eta = 0$, $\chi_1 = \chi_2$ and $\hat{\chi}_1 = \hat{\chi}_2$ always. The
critical density is thus the same as for the dictionary that is drawn
from the ensemble of rotationally invariant matrices.
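The coupled equations can be solved with elementary root finding. The sketch below is one possible approach under stated assumptions, not the authors' procedure: it takes the per-block condition $1/2 = 2(1 - \rho_i) Q(1/\sqrt{\hat{\chi}_i}) + \rho_i$ to fix $\hat{\chi}_i$, uses $\hat{\chi}_i = (-1)^i \eta + 2\rho_i(1 + \hat{\chi}_i) - 4(1 - \rho_i) r(\hat{\chi}_i)$ summed over $i = 1, 2$ to eliminate $\eta$, and then bisects on $\rho$. The form of $r$, the bracketing interval and all function names are assumptions of this example.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def Q(x):
    """Gaussian tail probability."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def Q_inv(p):
    """Functional inverse of the Q-function for 0 < p < 1."""
    return -_STD_NORMAL.inv_cdf(p)


def r(x):
    """Assumed form of the auxiliary function r(x) for x > 0."""
    return (math.sqrt(x / (2.0 * math.pi)) * math.exp(-1.0 / (2.0 * x))
            - (1.0 + x) * Q(1.0 / math.sqrt(x)))


def block_residual(rho_i):
    """Value of (-1)^i * eta implied by block i, with chi_hat_i from the Q-condition."""
    q = (1.0 - 2.0 * rho_i) / (4.0 * (1.0 - rho_i))
    chi_hat = Q_inv(q) ** -2
    return chi_hat - 2.0 * rho_i * (1.0 + chi_hat) + 4.0 * (1.0 - rho_i) * r(chi_hat)


def critical_density(mu, iterations=200):
    """Bisection for the overall density rho(mu) at which the eta terms cancel."""
    def total(rho):
        rho1 = 2.0 * mu * rho / (1.0 + mu)
        rho2 = 2.0 * rho / (1.0 + mu)
        return block_residual(rho1) + block_residual(rho2)

    lo, hi = 1e-9, (1.0 + mu) / 4.0 - 1e-9  # keeps both block densities below 1/2
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if total(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rho_uniform = critical_density(1.0)
rho_concentrated = critical_density(0.0)
```

Under these assumptions the two returned values should agree, up to the bisection accuracy, with the critical densities quoted in the numerical examples for $\mu = 1$ and $\mu = 0$.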

C. Numerical Examples 

Given the dictionary $D$ is drawn from the ensemble of
rotationally invariant matrices, the critical density for
$\ell_1$-recovery is known to be independent of the block densities
$\{\rho_1, \rho_2\}$ and given by $\rho = 0.19284483309074016\ldots$ for all
$\mu \in [0, 1]$. For the bi-orthogonal $D$, the threshold is the same







Fig. 1. Critical density for bi-orthogonal and rotationally invariant $D$. The
parameter $\mu \in [0, 1]$ determines how uniformly the non-zero components are
distributed between the two blocks ($\mu = 1$ fully uniform, $\mu = 0$ all non-zero
components are in the second block). The user has no knowledge about $\mu$.
Fig. 2. Critical density given $\mu = 0$, that is, $\rho_1 = 0$, $\rho_2 = 2\rho$ for finite sized
systems. Here 'R' means rotationally invariant $D$ and 'O' the bi-orthogonal
case. Each point is averaged over $10^6$ realizations of the optimization problem.
The filled markers at $x = 0$ are the predictions given by the replica analysis.

only for the case of uniform sparsity $\mu = 1$. For general $\mu$
we obtain different thresholds, as plotted in Fig. 1. Note that
$\rho(\mu)$ is a decreasing function of $\mu$, implying that the more
concentrated the non-zero components are in one block, the
bigger the benefit of using the bi-orthogonal dictionary. We
also carried out numerical simulations for the IID Gaussian
and bi-orthogonal $D$ using 'linprog' from Matlab Optimization
Toolbox. The results are plotted in Fig. 2 where for each
value of $N = 16, 18, \ldots, 50$, there are $10^6$ realizations of
the SR problem. Cubic curves are fitted to the data using
nonlinear least-squares regression. The critical density for the
bi-orthogonal case is predicted by the replica method to be
$\rho(0) = 0.22666551758496698\ldots$ and we observe that the
simulations match the analysis up to the third decimal place.

IV. Conclusions and Discussion 

The sparsity-undersampling trade-off for the bi-orthogonal
SR setup (2) was studied. For uniformly distributed non-zero
components, there is no difference in compression ratio if we
replace the rotationally invariant dictionary
$D \in \mathbb{R}^{M \times 2M}$ with a concatenated matrix
$D = [O_1 \; O_2] \in \mathbb{R}^{M \times 2M}$, where
$O_1, O_2$ are independent and drawn uniformly according to
the Haar measure on the group of all orthogonal $M \times M$
matrices. For non-uniform block sparsities, however, the
bi-orthogonal dictionaries were found to be beneficial compared
to the unstructured random dictionaries.

Appendix A 
Free Energy 

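Following [7], [15], we use the replica trick and write the
free energy density as

$$f = -\frac{1}{2} \lim_{\beta \to \infty} \frac{1}{\beta} \lim_{u \to 0} \frac{\partial}{\partial u} \lim_{M \to \infty} \frac{1}{M} \log \Xi^{(u)}_{\beta, M},$$

where, denoting $\Delta x_i^{[a]} = x_i^{[0]} - x_i^{[a]}$, $a = 0, 1, \ldots, u$,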
1 



= E lira 



[a) i|2 



>0+ T ~ 



(26) 



X 



(27) 

For $i = 1, 2$, the vectors $\{x_i^{[a]}\}_{a=1}^{u}$ are IID conditioned on $D$
and have the same density (5) as $x_i$. Furthermore, the elements
of the vectors $x_1^{[0]}$ and $x_2^{[0]}$ are independently drawn according
to $p_1$ and $p_2$ as given in (3) and $X = \{x_1^{[a]}, x_2^{[a]}\}_{a=0}^{u}$.

Let us concentrate on $\Xi^{(u)}_{\beta, M}$ and the inner expectation in
(27), which is over the orthogonal matrices $O_1$ and $O_2$ given
$X$. Since $O_i$ are orthogonal, the average affects only the
cross-terms of the form $(u_1^{[a]})^{\mathsf T} u_2^{[a]}$ where
$u_i^{[a]} = O_i \Delta x_i^{[a]}$. Define matrices
$S_i \in \mathbb{R}^{u \times u}$ for $i = 1, 2$, whose $(a, b)$th element

$$S_i^{[a,b]} = Q_i^{[0,0]} - Q_i^{[0,b]} - Q_i^{[a,0]} + Q_i^{[a,b]}, \quad i = 1, 2 \tag{28}$$

is the empirical covariance between the elements of $\Delta x_i^{[a]}$ and
$\Delta x_i^{[b]}$, written in terms of the empirical covariances

$$Q_i^{[a,b]} = M^{-1} (x_i^{[a]})^{\mathsf T} x_i^{[b]}, \quad a, b = 0, 1, \ldots, u, \tag{29}$$

between the components of the $a$th and $b$th replicas of $x_i$. For
analytical tractability, we make the standard replica symmetry
(RS) assumption on the correlations (29), i.e., $r_i = Q_i^{[0,0]}$,
$m_i = Q_i^{[0,b]} = Q_i^{[a,0]}$ $\forall a, b \geq 1$, $Q_i = Q_i^{[a,a]}$
$\forall a \geq 1$ and $q_i = Q_i^{[a,b]}$ $\forall a \neq b \geq 1$. The RS
free energy density is denoted $f_{\mathrm{rs}}$ and we remark that it does
not match $f$ if the system is replica symmetry breaking. Under the RS
assumption,

$$S_i = S_i^{[1,2]}\, 1_u 1_u^{\mathsf T} + \big( S_i^{[1,1]} - S_i^{[1,2]} \big) I_u, \quad i = 1, 2, \tag{30}$$

where $1_u \in \mathbb{R}^{u}$ is the vector of all-ones, and we may write
the inner expectation in (27) as

e _^ (s [^ +41 a] )E | e _i E ^ i(M M )Tt 

Using Lemma 2 and taking the limit $\tau \to 0^+$ leads to



A' 



}■ 



(31) 



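which is an undesired result. To keep $G^{(u)}$ and the free energy
density finite as $\tau \to 0^+$, we pose the constraints

$$S_1^{[1,1]} - S_1^{[1,2]} = S_2^{[1,1]} - S_2^{[1,2]}, \tag{34}$$

$$S_1^{[1,1]} - S_1^{[1,2]} + u S_1^{[1,2]} = S_2^{[1,1]} - S_2^{[1,2]} + u S_2^{[1,2]} \tag{35}$$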


on the elements of the replica symmetric matrices $S_1, S_2$.
Given (34) and (35) are satisfied, we get in the limit $\tau \to 0^+$
the expression for $G^{(u)} = G_1^{(u)} + G_2^{(u)}$ in terms of

$$G_i^{(u)} = \frac{1}{4} \log \big( Q_i - q_i + u (r_i - 2 m_i + q_i) \big) + \frac{u - 1}{4} \log (Q_i - q_i), \quad i = 1, 2. \tag{36}$$



Comparing (36) to [7, eq. (A.4)] reveals that the corresponding
terms for rotationally invariant and bi-orthogonal $D$ match
up to vanishing constants. Furthermore, in the limit $u \to 0$
the equalities (34) and (35) are equivalent to the condition
$\chi_1 = \chi_2$, where we denoted $\chi_i = \beta (Q_i - q_i)$ for notational
convenience. This provides the relevant constraint for the
evaluation of the RS free energy, as stated in Section III-A.

The next task would be to average (32) over the correlations
(29) using the theory of large deviations and saddle-point
integration. But since the effect of the bi-orthogonal sensing
matrix $D$ has been reduced to the above constraint, we omit
the calculations here due to space constraints. For details, see
[7, Appendix A] and [18].
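The replica symmetric structure used in this appendix can be made concrete with a few lines of code. The sketch below builds an overlap matrix with the RS pattern ($r$ at position $[0,0]$, $m$ in the remaining entries of row and column 0, $Q$ on the rest of the diagonal, $q$ elsewhere), forms $S^{[a,b]} = Q^{[0,0]} - Q^{[0,b]} - Q^{[a,0]} + Q^{[a,b]}$ for $a, b \geq 1$, and checks that $S$ has the all-ones vector as an eigenvector with eigenvalue $Q - q + u(r - 2m + q)$. The function names and the numerical values are arbitrary assumptions for illustration.

```python
def rs_overlap_matrix(u, r, m, Q, q):
    """(u+1) x (u+1) overlap matrix with replica symmetric entries, indices 0..u."""
    mat = [[q] * (u + 1) for _ in range(u + 1)]
    for a in range(1, u + 1):
        mat[a][a] = Q
        mat[0][a] = m
        mat[a][0] = m
    mat[0][0] = r
    return mat


def s_matrix(overlap):
    """u x u matrix with entries Q[0][0] - Q[0][b] - Q[a][0] + Q[a][b], a, b = 1..u."""
    u = len(overlap) - 1
    return [[overlap[0][0] - overlap[0][b] - overlap[a][0] + overlap[a][b]
             for b in range(1, u + 1)] for a in range(1, u + 1)]


u, r, m, Q, q = 4, 1.0, 0.6, 0.9, 0.5  # example values (assumed)
S = s_matrix(rs_overlap_matrix(u, r, m, Q, q))

diag = S[0][0]       # equals r - 2m + Q
off_diag = S[0][1]   # equals r - 2m + q
ones_eigenvalue = diag - off_diag + u * off_diag
row_sums = [sum(row) for row in S]  # S applied to the all-ones vector
```

The remaining $u - 1$ eigenvalues equal `diag - off_diag`, that is $Q - q$, which are the two arguments appearing inside the logarithms of the RS expression for $G_i^{(u)}$.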



Appendix B 
Matrix Integration 



Lemma 1. Let $O_1$ and $O_2$ be independent and drawn
uniformly according to the Haar measure on the group of
all orthogonal $M \times M$ matrices as in (2). Given vectors
$x_1, x_2 \in \mathbb{R}^{M}$, denote $\|x_i\|^2 = M r_i$ for $i = 1, 2$. Then

$$I_M(r_1, r_2; c) = \mathsf{E}_{O_1, O_2}\, e^{c\, x_1^{\mathsf T} O_1^{\mathsf T} O_2 x_2} = \mathsf{E}_{u_1, u_2}\, e^{c\, u_1^{\mathsf T} u_2}, \tag{37}$$

where $c \in \mathbb{R}$ and vectors $u_1, u_2 \in \mathbb{R}^{M}$ are independent and
uniformly distributed on the hyper-spheres at the boundaries
of $M$ dimensional balls with radii $R_1 = \sqrt{M r_1}$ and
$R_2 = \sqrt{M r_2}$, respectively. Furthermore,

$$F(r_1, r_2; c) = \lim_{M \to \infty} M^{-1} \log I_M(r_1, r_2; c) = \frac{\sqrt{1 + 4 c^2 r_1 r_2} - 1}{2} - \frac{1}{2} \log \bigg( \frac{1 + \sqrt{1 + 4 c^2 r_1 r_2}}{2} \bigg) \tag{38}$$

$$\approx \sqrt{c^2 r_1 r_2} - \log(c^2 r_1 r_2)/4, \quad \text{for } c^2 r_1 r_2 \gg 1. \tag{39}$$



$$\Xi^{(u)}_{\beta, M} = \mathsf{E} \bigg\{ \int e^{-M G^{(u)}} \prod_{a=1}^{u} e^{-\beta ( \|x_1^{[a]}\|_1 + \|x_2^{[a]}\|_1 )}\, dx_1^{[a]} dx_2^{[a]} \bigg\}, \tag{32}$$

where $G^{(u)} = \lim_{\tau \to 0^+} G_\tau^{(u)}$. The function $G_\tau^{(u)}$ given in (33)
is implicitly a function of both $S_1$ and $S_2$. To obtain (33) we
first used (45), then applied (39). Finally, some algebraic
manipulations give the reported result.

The problem with the limit $G^{(u)} = \lim_{\tau \to 0^+} G_\tau^{(u)}$ is that
it diverges and the free energy density grows without bound

Proof: Let $u_i = O_i x_i$ where $\{x_i\}_{i=1}^{2}$ are fixed and
$\{O_i\}_{i=1}^{2}$ independent and drawn uniformly according to the
Haar measure on the group of all orthogonal $M \times M$ matrices.
Since $\|u_i\|^2 = M r_i$ and $O_i$ rotate the vectors $u_i$ uniformly in
all directions, $u_i$ is uniformly distributed on the hyper-sphere
at the boundary of an $M$ dimensional ball having radius
$R_i = \sqrt{M r_i}$, providing the second equality in (37).

To assess the second part of the lemma, the joint measure
of $(u_1, u_2)$ reads $p(u_1; r_1)\, p(u_2; r_2)\, du_1 du_2$, where

$$p(u; r) = Z(r)^{-1}\, \delta(\|u\|^2 - M r). \tag{40}$$



log (S^-S^+uST^^-S 



[l,2] Uc [l,l] _ Qp.,2] 



u - 1 



togf^-srp 1 )^- 11 -^ 



[l,2h 



, (33) 



The normalization constant $Z(r)$ in (40) is the volume of the
hypersphere to which $u$ is constrained. Using Stirling's
formula for large $M$, we get up to a vanishing term $O(1/M)$

$$Z(r) = (2 \pi e r)^{M/2} / \sqrt{\pi r}. \tag{41}$$
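The Stirling step can be checked numerically. The sketch below assumes that $Z(r)$ is the surface area of the sphere of radius $\sqrt{Mr}$ in $M$ dimensions, namely $2 \pi^{M/2} (Mr)^{(M-1)/2} / \Gamma(M/2)$, and compares its logarithm with the logarithm of $(2 \pi e r)^{M/2} / \sqrt{\pi r}$. Both this interpretation and the function names are assumptions made for illustration.

```python
import math


def log_sphere_area(M, r):
    """Exact log surface area of the sphere of radius sqrt(M * r) in R^M."""
    return (math.log(2.0) + (M / 2.0) * math.log(math.pi)
            + ((M - 1) / 2.0) * math.log(M * r) - math.lgamma(M / 2.0))


def log_Z_stirling(M, r):
    """Logarithm of the large-M approximation (2 pi e r)^(M/2) / sqrt(pi r)."""
    return (M / 2.0) * math.log(2.0 * math.pi * math.e * r) - 0.5 * math.log(math.pi * r)


# Absolute gaps between exact and approximate log-normalization.
gaps = {(M, r): abs(log_sphere_area(M, r) - log_Z_stirling(M, r))
        for M in (50, 500, 5000) for r in (0.5, 2.0)}
```

Under this reading the gap in the logarithm shrinks like $1/M$, which is in line with the statement that the approximation holds up to a vanishing term.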



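With the help of the Laplace transform, we write

$$\delta(x - a) = \frac{1}{4 \pi \mathrm{i}} \int_{\gamma - \mathrm{i}\infty}^{\gamma + \mathrm{i}\infty} e^{-\frac{s}{2}(x - a)}\, ds, \tag{42}$$

so that using (40) - (42), the latter expectation in (37) becomes

$$\frac{(4 \pi \mathrm{i})^{-2}}{Z(r_1) Z(r_2)} \int e^{c\, u_1^{\mathsf T} u_2 - \sum_{i=1}^{2} ( \|u_i\|^2 - M r_i ) s_i / 2}\, du_1\, du_2\, ds_1\, ds_2 = \frac{(4 \mathrm{i})^{-2} \sqrt{r_1 r_2}}{\pi e^{M} (r_1 r_2)^{M/2}} \int \frac{e^{M (s_1 r_1 + s_2 r_2)/2}}{(s_1 s_2 - c^2)^{M/2}}\, ds_1\, ds_2, \tag{43}$$

where we used Gaussian integration to obtain (43). Since
$M \to \infty$, we next apply saddle-point integration to solve the
integrals w.r.t. $s_1$ and $s_2$. After canceling the vanishing terms,

$$\lim_{M \to \infty} M^{-1} \log I_M(r_1, r_2; c) = -1 - \frac{1}{2} \sum_{i=1}^{2} \log r_i + \frac{1}{2} \operatorname*{extr}_{s_1, s_2} \big\{ s_1 r_1 + s_2 r_2 - \log(s_1 s_2 - c^2) \big\}, \tag{44}$$

and (38) follows by solving the extremization, and (39) by
neglecting the terms that are of the order unity. ■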
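The limit in Lemma 1 can be compared against a direct finite-$M$ computation. By rotational invariance, $u_1^{\mathsf T} u_2 = M \sqrt{r_1 r_2}\, t$, where $t$ is the cosine of the angle between the two vectors and has density proportional to $(1 - t^2)^{(M-3)/2}$ on $(-1, 1)$. The sketch below evaluates $M^{-1} \log \mathsf{E}\, e^{c u_1^{\mathsf T} u_2}$ by one-dimensional quadrature and compares it with the closed form for $F$, assuming the reading of $F$ and of its large-argument approximation given here. The function names, the grid size and the parameter values are illustrative assumptions.

```python
import math


def F(r1, r2, c):
    """Assumed closed form of lim (1/M) log E exp(c * u1^T u2)."""
    s = math.sqrt(1.0 + 4.0 * c * c * r1 * r2)
    return (s - 1.0) / 2.0 - 0.5 * math.log((1.0 + s) / 2.0)


def F_large(r1, r2, c):
    """Approximation for c^2 r1 r2 >> 1, with order-unity terms dropped."""
    g2 = c * c * r1 * r2
    return math.sqrt(g2) - math.log(g2) / 4.0


def _log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def finite_M_rate(M, r1, r2, c, n=20000):
    """(1/M) log E exp(c u1^T u2) for u_i uniform on spheres of radius sqrt(M r_i)."""
    gamma = c * math.sqrt(r1 * r2)
    step = 2.0 / n
    log_num, log_den = [], []
    for k in range(n):
        t = -1.0 + (k + 0.5) * step                 # midpoint grid on (-1, 1)
        w = 0.5 * (M - 3) * math.log1p(-t * t)      # log-density of the cosine
        log_den.append(w)
        log_num.append(w + M * gamma * t)
    return (_log_sum_exp(log_num) - _log_sum_exp(log_den)) / M


rate_gap = abs(finite_M_rate(400, 1.0, 2.0, 0.7) - F(1.0, 2.0, 0.7))
order_unity_gap = F(1.0, 1.0, 100.0) - F_large(1.0, 1.0, 100.0)
```

The difference between `F` and `F_large` approaches $-1/2$ for large $c^2 r_1 r_2$, which is the order-unity term neglected in the approximation.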

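Lemma 2. Let $\{O_i\}_{i=1}^{2}$ be as in Lemma 1 and $\Delta x_i^{[a]}$ for
$i = 1, 2$ and $a = 1, \ldots, u$ as in (27). Then, under RS ansatz

$$\lim_{M \to \infty} M^{-1} \log \mathsf{E}_{O_1, O_2} \Big\{ e^{c \sum_{a=1}^{u} (O_1 \Delta x_1^{[a]})^{\mathsf T} (O_2 \Delta x_2^{[a]})} \,\Big|\, X \Big\} = F\big( S_1^{[1,1]} - S_1^{[1,2]} + u S_1^{[1,2]},\, S_2^{[1,1]} - S_2^{[1,2]} + u S_2^{[1,2]};\, c \big) + (u - 1)\, F\big( S_1^{[1,1]} - S_1^{[1,2]},\, S_2^{[1,1]} - S_2^{[1,2]};\, c \big), \tag{45}$$

where $c \in \mathbb{R}$ and $F(r_1, r_2; c)$ is given in (38).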

Proof: Denote $u_i^{[a]} = O_i \Delta x_i^{[a]}$ for all $i = 1, 2$ and
$a = 1, \ldots, u$. Given $X$, $u_i^{[a]}$ lie on the surfaces of
hyper-spheres as in the proof of Lemma 1. The RS ansatz
guarantees that $u_i^{[a]}$ can be expressed as
$[u_i^{[1]} \; u_i^{[2]} \cdots u_i^{[u]}] = [\tilde{u}_i^{[1]} \cdots \tilde{u}_i^{[u]}] E^{\mathsf T}$,
where $\{\tilde{u}_i^{[a]}\}$ is a set of vectors that satisfies

$$M^{-1} (\tilde{u}_i^{[a]})^{\mathsf T} \tilde{u}_i^{[b]} = \begin{cases} 0 & \text{if } a \neq b; \\ u S_i^{[1,2]} + S_i^{[1,1]} - S_i^{[1,2]} & \text{if } a = b = 1; \\ S_i^{[1,1]} - S_i^{[1,2]} & \text{if } a = b \geq 2. \end{cases} \tag{46}$$

The matrix $E = [u^{-1/2} 1_u \; e_2 \cdots e_u]$ provides an orthonormal
basis that is independent of index $i$. This indicates that the
expectation in (45) can be assessed w.r.t. $\{\tilde{u}_i^{[a]}\}$ instead of the
original non-orthogonal set $\{u_i^{[a]}\}$. The orthogonality allows
us to independently evaluate the expectation for each replica
index $a$ when $u \ll M$. Using Lemma 1 and (46) completes
the proof. ■